Overview

Dataset statistics

Number of variables: 14
Number of observations: 10296
Missing cells: 389
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 112.0 B
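
The derived figures in this table can be cross-checked with simple arithmetic. The sketch below reuses only the numbers printed above; the variable names are illustrative, and it assumes (the report does not say so explicitly) that the missing-cell percentage is taken over variables times observations.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 14
n_observations = 10296
missing_cells = 389
avg_record_bytes = 112.0

# Assumption: a "cell" is one value of one variable for one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximate in-memory size from the average record size (1 MiB = 2**20 bytes).
approx_total_mib = n_observations * avg_record_bytes / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"approx. size:  {approx_total_mib:.1f} MiB")
```

Rounded to one decimal place, both results agree with the values reported in the table.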

Variable types

Numeric: 11
Unsupported: 1
Categorical: 2

Warnings

CustMonVal is highly correlated with ClaimsRate [High correlation]
ClaimsRate is highly correlated with CustMonVal [High correlation]
PremLife has 104 (1.0%) missing values [Missing]
FirstPolYear is highly skewed (γ1 = 101.2958498) [Skewed]
CustMonVal is highly skewed (γ1 = -67.04273979) [Skewed]
ClaimsRate is highly skewed (γ1 = 71.20947447) [Skewed]
PremMotor is highly skewed (γ1 = 23.87096035) [Skewed]
PremHousehold is highly skewed (γ1 = 36.05402336) [Skewed]
PremHealth is highly skewed (γ1 = 84.51949178) [Skewed]
CustID is uniformly distributed [Uniform]
CustID has unique values [Unique]
EducDeg is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
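
Several of the skewness warnings above involve variables whose maximum or minimum lies far from the bulk of the data. The following sketch shows how a single extreme value can dominate γ1. It uses the plain moment-based coefficient; the exact estimator used by the report (for example a bias-corrected variant) is not stated here, so that choice is an assumption, and the toy data is hypothetical.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data: years spread evenly over 1974..1998,
# then one implausible year appended.
clean = [1974 + (i % 25) for i in range(1000)]
with_outlier = clean + [53784]

print(skewness(clean))          # symmetric spread, skewness near zero
print(skewness(with_outlier))   # one extreme value gives a large positive skewness
```

With one dominant outlier among n points, the coefficient grows roughly like the square root of n, which is why a single bad record can produce values in the tens or hundreds.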

Reproduction

Analysis started: 2022-01-15 22:46:31.047617
Analysis finished: 2022-01-15 22:47:11.476815
Duration: 40.43 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

CustID
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 10296
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5148.5
Minimum: 1
Maximum: 10296
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 515.75
Q1: 2574.75
median: 5148.5
Q3: 7722.25
95-th percentile: 9781.25
Maximum: 10296
Range: 10295
Interquartile range (IQR): 5147.5

Descriptive statistics

Standard deviation: 2972.34352
Coefficient of variation (CV): 0.5773222336
Kurtosis: -1.2
Mean: 5148.5
Median Absolute Deviation (MAD): 2574
Skewness: 0
Sum: 53008956
Variance: 8834826
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
6918 | 1 | < 0.1%
6860 | 1 | < 0.1%
6861 | 1 | < 0.1%
6862 | 1 | < 0.1%
6863 | 1 | < 0.1%
6864 | 1 | < 0.1%
6865 | 1 | < 0.1%
6866 | 1 | < 0.1%
6867 | 1 | < 0.1%
Other values (10286) | 10286 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
10296 | 1 | < 0.1%
10295 | 1 | < 0.1%
10294 | 1 | < 0.1%
10293 | 1 | < 0.1%
10292 | 1 | < 0.1%
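
The CustID statistics are what one would expect from a plain row counter. As a hedged check, the sketch below assumes CustID is exactly the sequence 1..n (consistent with the minimum, maximum, distinct count and strictly increasing order reported above) and compares the reported sum, mean and variance with the closed forms n(n+1)/2, (n+1)/2 and n(n+1)/12 for the sample variance.

```python
import math
import statistics

n = 10296                        # number of observations in the report
ids = list(range(1, n + 1))      # assumption: CustID is exactly 1..n

total = sum(ids)
mean = total / n
variance = statistics.variance(ids)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                       # coefficient of variation

# Closed forms for the integers 1..n.
assert total == n * (n + 1) // 2
assert variance == n * (n + 1) // 12

print(total, mean, variance)
print(round(std, 5), round(cv, 7))
```

Because the column carries no information beyond row order, it is usually treated as an identifier rather than as a numeric feature.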

FirstPolYear
Real number (ℝ≥0)

SKEWED

Distinct: 26
Distinct (%): 0.3%
Missing: 30
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.062634
Minimum: 1974
Maximum: 53784
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: 1974
5-th percentile: 1976
Q1: 1980
median: 1986
Q3: 1992
95-th percentile: 1996
Maximum: 53784
Range: 51810
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 511.2679127
Coefficient of variation (CV): 0.2567814312
Kurtosis: 10262.56551
Mean: 1991.062634
Median Absolute Deviation (MAD): 6
Skewness: 101.2958498
Sum: 20440249
Variance: 261394.8786
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=26)

Common values

Value | Count | Frequency (%)
1988 | 512 | 5.0%
1994 | 475 | 4.6%
1993 | 473 | 4.6%
1989 | 466 | 4.5%
1984 | 464 | 4.5%
1986 | 458 | 4.4%
1977 | 453 | 4.4%
1978 | 453 | 4.4%
1992 | 451 | 4.4%
1990 | 449 | 4.4%
Other values (16) | 5612 | 54.5%

Minimum 5 values

Value | Count | Frequency (%)
1974 | 141 | 1.4%
1975 | 285 | 2.8%
1976 | 433 | 4.2%
1977 | 453 | 4.4%
1978 | 453 | 4.4%

Maximum 5 values

Value | Count | Frequency (%)
53784 | 1 | < 0.1%
1998 | 112 | 1.1%
1997 | 271 | 2.6%
1996 | 440 | 4.3%
1995 | 445 | 4.3%
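
The maximum of 53784 is far outside the range of the other years and drives the skewness and kurtosis reported for this variable. One common way to flag such values is Tukey's 1.5 × IQR rule; the report itself does not apply this rule, so the sketch below is an illustration with a hypothetical helper name, using the quartiles printed above.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 and Q3 of FirstPolYear as reported in the quantile statistics.
low, high = tukey_fences(1980, 1992)

# The minimum and the two largest values listed in the extreme-value tables.
extremes = [1974, 1998, 53784]
flagged = [v for v in extremes if not low <= v <= high]

print((low, high))
print(flagged)
```

The same function can be applied to the quartiles of the other numeric variables, although the choice of k and whether to drop, cap or correct flagged values is a modelling decision that this report does not make.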

BirthYear
Real number (ℝ≥0)

Distinct: 68
Distinct (%): 0.7%
Missing: 17
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1968.007783
Minimum: 1028
Maximum: 2001
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: 1028
5-th percentile: 1941
Q1: 1953
median: 1968
Q3: 1983
95-th percentile: 1995
Maximum: 2001
Range: 973
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 19.70947624
Coefficient of variation (CV): 0.01001493816
Kurtosis: 501.8127652
Mean: 1968.007783
Median Absolute Deviation (MAD): 15
Skewness: -10.53673466
Sum: 20229152
Variance: 388.4634537
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1962 | 206 | 2.0%
1968 | 200 | 1.9%
1964 | 194 | 1.9%
1953 | 193 | 1.9%
1981 | 190 | 1.8%
1990 | 189 | 1.8%
1974 | 187 | 1.8%
1951 | 186 | 1.8%
1952 | 186 | 1.8%
1963 | 186 | 1.8%
Other values (58) | 8362 | 81.2%

Minimum 5 values

Value | Count | Frequency (%)
1028 | 1 | < 0.1%
1935 | 14 | 0.1%
1936 | 37 | 0.4%
1937 | 57 | 0.6%
1938 | 77 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
2001 | 12 | 0.1%
2000 | 35 | 0.3%
1999 | 69 | 0.7%
1998 | 94 | 0.9%
1997 | 133 | 1.3%

EducDeg
Unsupported

REJECTED
UNSUPPORTED

Missing: 17
Missing (%): 0.2%
Memory size: 80.6 KiB
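
The report does not say why EducDeg is unsupported. The sample rows at the end of this report show its values as byte strings such as b'2 - High School', which suggests (as an assumption) that the column holds bytes objects rather than text. A minimal sketch of decoding such values and splitting them into an ordinal rank and a label follows; the helper name and the UTF-8 encoding are assumptions.

```python
# Values copied from the EducDeg column of the sample rows in this report.
raw = [b'2 - High School', b'1 - Basic', b'3 - BSc/MSc', b'4 - PhD']

def split_degree(value):
    """Decode a bytes label (assumed UTF-8) and split it into (rank, name)."""
    text = value.decode("utf-8") if isinstance(value, bytes) else str(value)
    rank, name = text.split(" - ", 1)
    return int(rank), name

decoded = [split_degree(v) for v in raw]
print(decoded)
```

After decoding, the column could be profiled as a categorical variable, or the rank could be used as an ordinal feature.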

MonthSal
Real number (ℝ≥0)

Distinct: 3565
Distinct (%): 34.7%
Missing: 36
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2506.667057
Minimum: 333
Maximum: 55215
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: 333
5-th percentile: 933.9
Q1: 1706
median: 2501.5
Q3: 3290.25
95-th percentile: 4046
Maximum: 55215
Range: 54882
Interquartile range (IQR): 1584.25

Descriptive statistics

Standard deviation: 1157.449634
Coefficient of variation (CV): 0.4617484524
Kurtosis: 474.3813076
Mean: 2506.667057
Median Absolute Deviation (MAD): 791.5
Skewness: 11.25083378
Sum: 25718404
Variance: 1339689.656
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
3200 | 10 | 0.1%
1398 | 10 | 0.1%
3776 | 10 | 0.1%
2687 | 10 | 0.1%
3560 | 10 | 0.1%
2308 | 9 | 0.1%
1766 | 9 | 0.1%
3568 | 9 | 0.1%
2959 | 9 | 0.1%
2073 | 9 | 0.1%
Other values (3555) | 10165 | 98.7%
(Missing) | 36 | 0.3%

Minimum 5 values

Value | Count | Frequency (%)
333 | 3 | < 0.1%
334 | 1 | < 0.1%
335 | 3 | < 0.1%
336 | 1 | < 0.1%
340 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
55215 | 1 | < 0.1%
34490 | 1 | < 0.1%
5021 | 1 | < 0.1%
4995 | 1 | < 0.1%
4904 | 1 | < 0.1%

GeoLivArea
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 80.6 KiB

4.0: 4145
1.0: 3048
3.0: 2066
2.0: 1036

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 30885
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 4.0
3rd row: 3.0
4th row: 4.0
5th row: 4.0

Common values

Value | Count | Frequency (%)
4.0 | 4145 | 40.3%
1.0 | 3048 | 29.6%
3.0 | 2066 | 20.1%
2.0 | 1036 | 10.1%
(Missing) | 1 | < 0.1%

Histogram of lengths of the category

Category frequencies (missing values excluded)

Value | Count | Frequency (%)
4.0 | 4145 | 40.3%
1.0 | 3048 | 29.6%
3.0 | 2066 | 20.1%
2.0 | 1036 | 10.1%

Most occurring characters

Value | Count | Frequency (%)
. | 10295 | 33.3%
0 | 10295 | 33.3%
4 | 4145 | 13.4%
1 | 3048 | 9.9%
3 | 2066 | 6.7%
2 | 1036 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20590 | 66.7%
Other Punctuation | 10295 | 33.3%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
0 | 10295 | 50.0%
4 | 4145 | 20.1%
1 | 3048 | 14.8%
3 | 2066 | 10.0%
2 | 1036 | 5.0%

Other Punctuation

Value | Count | Frequency (%)
. | 10295 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30885 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
. | 10295 | 33.3%
0 | 10295 | 33.3%
4 | 4145 | 13.4%
1 | 3048 | 9.9%
3 | 2066 | 6.7%
2 | 1036 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30885 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
. | 10295 | 33.3%
0 | 10295 | 33.3%
4 | 4145 | 13.4%
1 | 3048 | 9.9%
3 | 2066 | 6.7%
2 | 1036 | 3.4%

Children
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 21
Missing (%): 0.2%
Memory size: 80.6 KiB

1.0: 7262
0.0: 3013

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 30825
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common values

Value | Count | Frequency (%)
1.0 | 7262 | 70.5%
0.0 | 3013 | 29.3%
(Missing) | 21 | 0.2%

Histogram of lengths of the category

Category frequencies (missing values excluded)

Value | Count | Frequency (%)
1.0 | 7262 | 70.7%
0.0 | 3013 | 29.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 13288 | 43.1%
. | 10275 | 33.3%
1 | 7262 | 23.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20550 | 66.7%
Other Punctuation | 10275 | 33.3%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
0 | 13288 | 64.7%
1 | 7262 | 35.3%

Other Punctuation

Value | Count | Frequency (%)
. | 10275 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30825 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
0 | 13288 | 43.1%
. | 10275 | 33.3%
1 | 7262 | 23.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 30825 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
0 | 13288 | 43.1%
. | 10275 | 33.3%
1 | 7262 | 23.6%

CustMonVal
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 7012
Distinct (%): 68.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 177.8926049
Minimum: -165680.42
Maximum: 11875.89
Zeros: 2
Zeros (%): < 0.1%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -165680.42
5-th percentile: -93.4875
Q1: -9.44
median: 186.87
Q3: 399.7775
95-th percentile: 645.9325
Maximum: 11875.89
Range: 177556.31
Interquartile range (IQR): 409.2175

Descriptive statistics

Standard deviation: 1945.811505
Coefficient of variation (CV): 10.93812475
Kurtosis: 5323.18296
Mean: 177.8926049
Median Absolute Deviation (MAD): 202.155
Skewness: -67.04273979
Sum: 1831582.26
Variance: 3786182.414
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
-25 | 272 | 2.6%
-31 | 12 | 0.1%
-37 | 11 | 0.1%
-35 | 11 | 0.1%
-15.11 | 10 | 0.1%
-12.33 | 10 | 0.1%
-47.67 | 10 | 0.1%
-33.89 | 9 | 0.1%
-10.33 | 9 | 0.1%
-21.11 | 9 | 0.1%
Other values (7002) | 9933 | 96.5%

Minimum 5 values

Value | Count | Frequency (%)
-165680.42 | 1 | < 0.1%
-64891 | 1 | < 0.1%
-52382.76 | 1 | < 0.1%
-37327.08 | 1 | < 0.1%
-28945.4 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11875.89 | 1 | < 0.1%
5596.84 | 1 | < 0.1%
4328.5 | 1 | < 0.1%
2314.21 | 1 | < 0.1%
2054.07 | 1 | < 0.1%

ClaimsRate
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 165
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7427719503
Minimum: 0
Maximum: 256.2
Zeros: 58
Zeros (%): 0.6%
Memory size: 80.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.16
Q1: 0.39
median: 0.72
Q3: 0.98
95-th percentile: 1.1
Maximum: 256.2
Range: 256.2
Interquartile range (IQR): 0.59

Descriptive statistics

Standard deviation: 2.916963637
Coefficient of variation (CV): 3.927132192
Kurtosis: 5877.806759
Mean: 0.7427719503
Median Absolute Deviation (MAD): 0.28
Skewness: 71.20947447
Sum: 7647.58
Variance: 8.508676861
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 457 | 4.4%
1.01 | 212 | 2.1%
1.02 | 203 | 2.0%
0.99 | 198 | 1.9%
1.03 | 195 | 1.9%
0.98 | 173 | 1.7%
0.97 | 148 | 1.4%
0.95 | 143 | 1.4%
1.04 | 140 | 1.4%
0.91 | 139 | 1.4%
Other values (155) | 8288 | 80.5%

Minimum 5 values

Value | Count | Frequency (%)
0 | 58 | 0.6%
0.01 | 2 | < 0.1%
0.03 | 4 | < 0.1%
0.04 | 5 | < 0.1%
0.05 | 8 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
256.2 | 1 | < 0.1%
96 | 1 | < 0.1%
69 | 1 | < 0.1%
63 | 1 | < 0.1%
35 | 1 | < 0.1%

PremMotor
Real number (ℝ)

SKEWED

Distinct: 1950
Distinct (%): 19.0%
Missing: 34
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 300.4702524
Minimum: -4.11
Maximum: 11604.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -4.11
5-th percentile: 63.68
Q1: 190.59
median: 298.61
Q3: 408.3
95-th percentile: 515.32
Maximum: 11604.42
Range: 11608.53
Interquartile range (IQR): 217.71

Descriptive statistics

Standard deviation: 211.914997
Coefficient of variation (CV): 0.7052777947
Kurtosis: 1096.286508
Mean: 300.4702524
Median Absolute Deviation (MAD): 108.69
Skewness: 23.87096035
Sum: 3083425.73
Variance: 44907.96595
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
398.74 | 17 | 0.2%
361.29 | 16 | 0.2%
246.49 | 16 | 0.2%
206.15 | 15 | 0.1%
279.61 | 15 | 0.1%
409.52 | 14 | 0.1%
269.94 | 13 | 0.1%
346.51 | 13 | 0.1%
312.06 | 13 | 0.1%
210.26 | 13 | 0.1%
Other values (1940) | 10117 | 98.3%
(Missing) | 34 | 0.3%

Minimum 5 values

Value | Count | Frequency (%)
-4.11 | 1 | < 0.1%
1.78 | 1 | < 0.1%
3.78 | 2 | < 0.1%
4.78 | 1 | < 0.1%
6.78 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11604.42 | 1 | < 0.1%
8744.61 | 1 | < 0.1%
5645.5 | 1 | < 0.1%
4273.49 | 1 | < 0.1%
4003.44 | 1 | < 0.1%

PremHousehold
Real number (ℝ)

SKEWED

Distinct: 1061
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 210.4311917
Minimum: -75
Maximum: 25048.8
Zeros: 60
Zeros (%): 0.6%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -75
5-th percentile: -30
Q1: 49.45
median: 132.8
Q3: 290.05
95-th percentile: 695.7
Maximum: 25048.8
Range: 25123.8
Interquartile range (IQR): 240.6

Descriptive statistics

Standard deviation: 352.595984
Coefficient of variation (CV): 1.675588021
Kurtosis: 2427.155944
Mean: 210.4311917
Median Absolute Deviation (MAD): 103.35
Skewness: 36.05402336
Sum: 2166599.55
Variance: 124323.928
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
39.45 | 62 | 0.6%
0 | 60 | 0.6%
-45.55 | 60 | 0.6%
19.45 | 60 | 0.6%
-30.55 | 57 | 0.6%
-5.55 | 56 | 0.5%
69.45 | 55 | 0.5%
34.45 | 54 | 0.5%
44.45 | 54 | 0.5%
-40.55 | 53 | 0.5%
Other values (1051) | 9725 | 94.5%

Minimum 5 values

Value | Count | Frequency (%)
-75 | 18 | 0.2%
-70 | 34 | 0.3%
-65 | 35 | 0.3%
-60 | 31 | 0.3%
-55 | 27 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
25048.8 | 1 | < 0.1%
8762.8 | 1 | < 0.1%
4130.7 | 1 | < 0.1%
2223.75 | 1 | < 0.1%
1957.6 | 1 | < 0.1%

PremHealth
Real number (ℝ)

SKEWED

Distinct: 1006
Distinct (%): 9.8%
Missing: 43
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 171.5808329
Minimum: -2.11
Maximum: 28272
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -2.11
5-th percentile: 54.12
Q1: 111.8
median: 162.81
Q3: 219.82
95-th percentile: 297.39
Maximum: 28272
Range: 28274.11
Interquartile range (IQR): 108.02

Descriptive statistics

Standard deviation: 296.4059761
Coefficient of variation (CV): 1.727500508
Kurtosis: 7914.203507
Mean: 171.5808329
Median Absolute Deviation (MAD): 54.01
Skewness: 84.51949178
Sum: 1759218.28
Variance: 87856.50266
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
130.47 | 30 | 0.3%
178.7 | 29 | 0.3%
159.14 | 28 | 0.3%
158.14 | 27 | 0.3%
147.36 | 26 | 0.3%
112.91 | 26 | 0.3%
169.7 | 26 | 0.3%
136.58 | 25 | 0.2%
151.03 | 25 | 0.2%
121.58 | 25 | 0.2%
Other values (996) | 9986 | 97.0%
(Missing) | 43 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
-2.11 | 1 | < 0.1%
5.78 | 1 | < 0.1%
7.78 | 1 | < 0.1%
11.67 | 1 | < 0.1%
12.67 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
28272 | 1 | < 0.1%
7322.48 | 1 | < 0.1%
1767 | 1 | < 0.1%
1045.52 | 1 | < 0.1%
442.86 | 1 | < 0.1%

PremLife
Real number (ℝ)

MISSING

Distinct: 611
Distinct (%): 6.0%
Missing: 104
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.85578199
Minimum: -7
Maximum: 398.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -7
5-th percentile: -1.11
Q1: 9.89
median: 25.56
Q3: 57.79
95-th percentile: 140.36
Maximum: 398.3
Range: 405.3
Interquartile range (IQR): 47.9

Descriptive statistics

Standard deviation: 47.480632
Coefficient of variation (CV): 1.134386452
Kurtosis: 5.716367231
Mean: 41.85578199
Median Absolute Deviation (MAD): 19.56
Skewness: 2.089846133
Sum: 426594.13
Variance: 2254.410416
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
9.89 | 130 | 1.3%
3.89 | 121 | 1.2%
0.89 | 119 | 1.2%
-1.11 | 117 | 1.1%
12.89 | 109 | 1.1%
6.89 | 107 | 1.0%
5.89 | 106 | 1.0%
4.89 | 106 | 1.0%
7.89 | 102 | 1.0%
1.89 | 101 | 1.0%
Other values (601) | 9074 | 88.1%
(Missing) | 104 | 1.0%

Minimum 5 values

Value | Count | Frequency (%)
-7 | 65 | 0.6%
-6 | 64 | 0.6%
-5 | 74 | 0.7%
-4 | 56 | 0.5%
-3 | 72 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
398.3 | 1 | < 0.1%
365.18 | 1 | < 0.1%
363.29 | 1 | < 0.1%
354.4 | 1 | < 0.1%
346.4 | 1 | < 0.1%

PremWork
Real number (ℝ)

Distinct: 898
Distinct (%): 8.8%
Missing: 86
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.2775142
Minimum: -12
Maximum: 1988.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 80.6 KiB

Quantile statistics

Minimum: -12
5-th percentile: -4
Q1: 10.67
median: 25.67
Q3: 56.79
95-th percentile: 137.3105
Maximum: 1988.7
Range: 2000.7
Interquartile range (IQR): 46.12

Descriptive statistics

Standard deviation: 51.51357235
Coefficient of variation (CV): 1.247981458
Kurtosis: 212.7789142
Mean: 41.2775142
Median Absolute Deviation (MAD): 19.45
Skewness: 7.43811547
Sum: 421443.42
Variance: 2653.648136
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
10.89 | 76 | 0.7%
9.89 | 72 | 0.7%
11.89 | 70 | 0.7%
-5.11 | 68 | 0.7%
-0.11 | 66 | 0.6%
1.89 | 65 | 0.6%
16.89 | 65 | 0.6%
3.89 | 64 | 0.6%
14.89 | 64 | 0.6%
4.89 | 63 | 0.6%
Other values (888) | 9537 | 92.6%
(Missing) | 86 | 0.8%

Minimum 5 values

Value | Count | Frequency (%)
-12 | 34 | 0.3%
-11 | 32 | 0.3%
-10 | 32 | 0.3%
-9 | 29 | 0.3%
-8 | 51 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
1988.7 | 1 | < 0.1%
930.44 | 1 | < 0.1%
494.1 | 1 | < 0.1%
451.53 | 1 | < 0.1%
417.08 | 1 | < 0.1%

Interactions

[Interaction plots: 110 figures, not reproduced in this text version]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
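
The definition above can be sketched in a few lines. This is an illustration on hypothetical toy data rather than the report's own implementation; the function name is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])        # perfectly linear, increasing
r_down = pearson_r(xs, [-0.5 * x + 10 for x in xs])  # perfectly linear, decreasing
print(r_up, r_down)
```

Changing the slope or intercept of the linear relation leaves the magnitude of r unchanged, which is the location and scale invariance mentioned above.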

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
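
As a sketch of this definition, the example below ranks each variable (assigning tied values their average rank, which is a common convention but an assumption here) and then applies Pearson's formula to the ranks. Function names and data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]             # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic relation, ρ reaches its maximum while r stays below it, illustrating why ρ is better suited to nonlinear monotonic relationships.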

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
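
The pair-counting definition translates directly into code. The sketch below implements the plain variant described in the text (often called tau-a), with no correction for ties; statistical libraries frequently apply a tie-corrected variant instead, so values may differ on data with many ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, without tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1      # both variables order the pair the same way
        elif sign < 0:
            discordant += 1      # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]                # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.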

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
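
A minimal sketch of a bias-corrected Cramér's V follows. It uses the correction commonly attributed to Bergsma (2013); whether it matches the report's implementation in every detail is an assumption, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    r, c = len(row), len(col)
    phi2 = chi2 / n
    # Corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, c_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical toy data: one perfectly associated pair, one independent pair.
v_perfect = cramers_v_corrected(["a", "b"] * 50, [0, 1] * 50)
v_indep = cramers_v_corrected(["a", "a", "b", "b"] * 25, [0, 1, 0, 1] * 25)
print(v_perfect, v_indep)
```

In this report the measure would apply to the two categorical variables, GeoLivArea and Children.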

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
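
Nullity correlation can be illustrated as the Pearson correlation between presence indicators of two columns. This is a sketch under the assumption that the heatmap is computed this way; the column contents below are hypothetical, with None standing for a missing value.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")      # a fully present or fully missing column has no variation
    return cov / math.sqrt(va * vb)

# Hypothetical columns: the first two are missing in the same rows,
# the third is missing exactly where the first is present.
col_1 = [9.89, None, 25.56, None, 57.79, 3.89]
col_2 = [10.67, None, 25.67, None, 56.79, 4.89]
col_3 = [None, 1.0, None, 0.0, None, None]

same = nullity_correlation(col_1, col_2)
opposite = nullity_correlation(col_1, col_3)
print(same, opposite)
```

A value near +1 means two variables tend to be missing together, and a value near -1 means one tends to be present when the other is missing.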

Sample

First rows

Index | CustID | FirstPolYear | BirthYear | EducDeg | MonthSal | GeoLivArea | Children | CustMonVal | ClaimsRate | PremMotor | PremHousehold | PremHealth | PremLife | PremWork
0 | 1.0 | 1985.0 | 1982.0 | b'2 - High School' | 2177.0 | 1.0 | 1.0 | 380.97 | 0.39 | 375.85 | 79.45 | 146.36 | 47.01 | 16.89
1 | 2.0 | 1981.0 | 1995.0 | b'2 - High School' | 677.0 | 4.0 | 1.0 | -131.13 | 1.12 | 77.46 | 416.20 | 116.69 | 194.48 | 106.13
2 | 3.0 | 1991.0 | 1970.0 | b'1 - Basic' | 2277.0 | 3.0 | 0.0 | 504.67 | 0.28 | 206.15 | 224.50 | 124.58 | 86.35 | 99.02
3 | 4.0 | 1990.0 | 1981.0 | b'3 - BSc/MSc' | 1099.0 | 4.0 | 1.0 | -16.99 | 0.99 | 182.48 | 43.35 | 311.17 | 35.34 | 28.34
4 | 5.0 | 1986.0 | 1973.0 | b'3 - BSc/MSc' | 1763.0 | 4.0 | 1.0 | 35.23 | 0.90 | 338.62 | 47.80 | 182.59 | 18.78 | 41.45
5 | 6.0 | 1986.0 | 1956.0 | b'2 - High School' | 2566.0 | 4.0 | 1.0 | -24.33 | 1.00 | 440.75 | 18.90 | 114.80 | 7.00 | 7.67
6 | 7.0 | 1979.0 | 1943.0 | b'2 - High School' | 4103.0 | 4.0 | 0.0 | -66.01 | 1.05 | 156.92 | 295.60 | 317.95 | 14.67 | 26.34
7 | 8.0 | 1988.0 | 1974.0 | b'2 - High School' | 1743.0 | 4.0 | 1.0 | -144.91 | 1.13 | 248.27 | 397.30 | 144.36 | 66.68 | 53.23
8 | 9.0 | 1981.0 | 1978.0 | b'3 - BSc/MSc' | 1862.0 | 1.0 | 1.0 | 356.53 | 0.36 | 344.51 | 18.35 | 210.04 | 8.78 | 9.89
9 | 10.0 | 1976.0 | 1948.0 | b'3 - BSc/MSc' | 3842.0 | 1.0 | 0.0 | -119.35 | 1.12 | 209.26 | 182.25 | 271.94 | 39.23 | 55.12

Last rows

Index | CustID | FirstPolYear | BirthYear | EducDeg | MonthSal | GeoLivArea | Children | CustMonVal | ClaimsRate | PremMotor | PremHousehold | PremHealth | PremLife | PremWork
10286 | 10287.0 | 1997.0 | 1943.0 | b'3 - BSc/MSc' | 3975.0 | 2.0 | 0.0 | 220.27 | 0.62 | 285.61 | 77.25 | 241.49 | 31.45 | 8.89
10287 | 10288.0 | 1996.0 | 1941.0 | b'2 - High School' | 3845.0 | 4.0 | 0.0 | 99.47 | 0.90 | 87.35 | 843.50 | 121.58 | 157.92 | 33.45
10288 | 10289.0 | 1982.0 | 1993.0 | b'2 - High School' | 1465.0 | 1.0 | 1.0 | 795.15 | 0.35 | 67.79 | 820.15 | 102.13 | 182.48 | 86.46
10289 | 10290.0 | 1986.0 | 1943.0 | b'2 - High School' | 3498.0 | 4.0 | 0.0 | 245.60 | 0.67 | 227.82 | 270.60 | 160.92 | 100.13 | 69.90
10290 | 10291.0 | 1994.0 | 1999.0 | b'1 - Basic' | 626.0 | 3.0 | 1.0 | 176.26 | 0.85 | 6.89 | 878.50 | 103.13 | 113.02 | 201.26
10291 | 10292.0 | 1984.0 | 1949.0 | b'4 - PhD' | 3188.0 | 2.0 | 0.0 | -0.11 | 0.96 | 393.74 | 49.45 | 173.81 | 9.78 | 14.78
10292 | 10293.0 | 1977.0 | 1952.0 | b'1 - Basic' | 2431.0 | 3.0 | 0.0 | 1405.60 | 0.00 | 133.58 | 1035.75 | 143.25 | 12.89 | 105.13
10293 | 10294.0 | 1994.0 | 1976.0 | b'3 - BSc/MSc' | 2918.0 | 1.0 | 1.0 | 524.10 | 0.21 | 403.63 | 132.80 | 142.25 | 12.67 | 4.89
10294 | 10295.0 | 1981.0 | 1977.0 | b'1 - Basic' | 1971.0 | 2.0 | 1.0 | 250.05 | 0.65 | 188.59 | 211.15 | 198.37 | 63.90 | 112.91
10295 | 10296.0 | 1990.0 | 1981.0 | b'4 - PhD' | 2815.0 | 1.0 | 1.0 | 463.75 | 0.27 | 414.08 | 94.45 | 141.25 | 6.89 | 12.89